Overview

Dataset statistics

Number of variables: 25
Number of observations: 30000
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 5.7 MiB
Average record size in memory: 200.0 B
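As a rough check on these memory figures, the arithmetic below assumes that every one of the 25 columns is stored with an 8-byte dtype such as int64 or float64 and ignores index overhead. The report does not list dtypes, so this is an assumption, not a statement of how pandas-profiling measures memory.

```python
# Assumption: all 25 columns use 8-byte dtypes (e.g. int64/float64) and the
# index overhead is ignored.
N_ROWS = 30_000
N_COLS = 25
BYTES_PER_VALUE = 8

column_kib = N_ROWS * BYTES_PER_VALUE / 1024              # per-variable "Memory size"
total_mib = N_ROWS * N_COLS * BYTES_PER_VALUE / 1024**2   # "Total size in memory"
record_bytes = N_COLS * BYTES_PER_VALUE                   # "Average record size in memory"

print(f"{column_kib:.1f} KiB per column")
print(f"{total_mib:.1f} MiB in total")
print(f"{record_bytes:.1f} B per record")
```

Under this assumption the per-column figure matches the 234.4 KiB "Memory size" reported for every variable below.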

Variable types

NUM: 22
CAT: 2
BOOL: 1

Reproduction

Analysis started: 2020-07-02 17:58:35.908895
Analysis finished: 2020-07-02 18:00:04.408372
Duration: 1 minute and 28.5 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Configuration file: config.yaml

Warnings

BILL_AMT2 is highly correlated with BILL_AMT1 and 1 other field [High correlation]
BILL_AMT1 is highly correlated with BILL_AMT2 [High correlation]
BILL_AMT3 is highly correlated with BILL_AMT2 and 1 other field [High correlation]
BILL_AMT4 is highly correlated with BILL_AMT3 and 2 other fields [High correlation]
BILL_AMT5 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
BILL_AMT6 is highly correlated with BILL_AMT4 and 1 other field [High correlation]
PAY_AMT2 is highly skewed (γ1 = 30.45381745) [Skewed]
ID has unique values [Unique]
PAY_0 has 14737 (49.1%) zeros [Zeros]
PAY_2 has 15730 (52.4%) zeros [Zeros]
PAY_3 has 15764 (52.5%) zeros [Zeros]
PAY_4 has 16455 (54.9%) zeros [Zeros]
PAY_5 has 16947 (56.5%) zeros [Zeros]
PAY_6 has 16286 (54.3%) zeros [Zeros]
BILL_AMT1 has 2008 (6.7%) zeros [Zeros]
BILL_AMT2 has 2506 (8.4%) zeros [Zeros]
BILL_AMT3 has 2870 (9.6%) zeros [Zeros]
BILL_AMT4 has 3195 (10.7%) zeros [Zeros]
BILL_AMT5 has 3506 (11.7%) zeros [Zeros]
BILL_AMT6 has 4020 (13.4%) zeros [Zeros]
PAY_AMT1 has 5249 (17.5%) zeros [Zeros]
PAY_AMT2 has 5396 (18.0%) zeros [Zeros]
PAY_AMT3 has 5968 (19.9%) zeros [Zeros]
PAY_AMT4 has 6408 (21.4%) zeros [Zeros]
PAY_AMT5 has 6703 (22.3%) zeros [Zeros]
PAY_AMT6 has 7173 (23.9%) zeros [Zeros]

Variables

ID
Real number (ℝ≥0)

UNIQUE

Distinct count: 30000
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 15000.5
Minimum: 1
Maximum: 30000
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1500.95
Q1: 7500.75
median: 15000.5
Q3: 22500.25
95-th percentile: 28500.05
Maximum: 30000
Range: 29999
Interquartile range (IQR): 14999.5

Descriptive statistics

Standard deviation: 8660.398374
Coefficient of variation (CV): 0.5773406469
Kurtosis: -1.2
Mean: 15000.5
Median Absolute Deviation (MAD): 7500
Skewness: 0
Sum: 450015000
Variance: 75002500

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
2047 | 1 | < 0.1%
1322 | 1 | < 0.1%
15629 | 1 | < 0.1%
9486 | 1 | < 0.1%
11535 | 1 | < 0.1%
21792 | 1 | < 0.1%
23841 | 1 | < 0.1%
17698 | 1 | < 0.1%
19747 | 1 | < 0.1%
29988 | 1 | < 0.1%
Other values (29990) | 29990 | > 99.9%

Minimum 5 values
Value | Count | Frequency (%)
1 | 1 | < 0.1%
2 | 1 | < 0.1%
3 | 1 | < 0.1%
4 | 1 | < 0.1%
5 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
30000 | 1 | < 0.1%
29999 | 1 | < 0.1%
29998 | 1 | < 0.1%
29997 | 1 | < 0.1%
29996 | 1 | < 0.1%

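The ID statistics above make a convenient sanity check, because the column is simply the integers 1 to 30000 (unique, minimum 1, maximum 30000, sum 450015000). The standard-library sketch below reproduces the mean, variance, coefficient of variation, quantiles, IQR and MAD. The n - 1 variance denominator and linear interpolation between order statistics are assumptions that happen to match the reported values; the helper `quantile_linear` is a hypothetical name, not pandas-profiling code.

```python
import math
import statistics

# Assumption: the ID column is exactly the integers 1..30000, as implied by the
# UNIQUE tag, the minimum, the maximum and the sum reported above.
ids = list(range(1, 30_001))


def quantile_linear(sorted_values, q):
    """Quantile by linear interpolation between neighbouring order statistics."""
    pos = (len(sorted_values) - 1) * q
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


mean = statistics.fmean(ids)
variance = statistics.variance(ids)   # sample variance, n - 1 denominator
std = math.sqrt(variance)
cv = std / mean                       # coefficient of variation
q1 = quantile_linear(ids, 0.25)
q3 = quantile_linear(ids, 0.75)
median = statistics.median(ids)
# Median absolute deviation: median of |x - median| over all observations.
mad = statistics.median(abs(x - median) for x in ids)

print(f"mean={mean} variance={variance} std={std:.6f} cv={cv:.10f}")
print(f"p5={quantile_linear(ids, 0.05):.2f} Q1={q1} Q3={q3} IQR={q3 - q1} MAD={mad}")
```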
LIMIT_BAL
Real number (ℝ≥0)

Distinct count: 81
Unique (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 167484.32266666667
Minimum: 10000.0
Maximum: 1000000.0
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 10000
5-th percentile: 20000
Q1: 50000
median: 140000
Q3: 240000
95-th percentile: 430000
Maximum: 1000000
Range: 990000
Interquartile range (IQR): 190000

Descriptive statistics

Standard deviation: 129747.6616
Coefficient of variation (CV): 0.7746854124
Kurtosis: 0.5362628964
Mean: 167484.3227
Median Absolute Deviation (MAD): 90000
Skewness: 0.9928669605
Sum: 5024529680
Variance: 1.683445568e+10

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
50000 | 3365 | 11.2%
20000 | 1976 | 6.6%
30000 | 1610 | 5.4%
80000 | 1567 | 5.2%
200000 | 1528 | 5.1%
150000 | 1110 | 3.7%
100000 | 1048 | 3.5%
180000 | 995 | 3.3%
360000 | 881 | 2.9%
60000 | 825 | 2.8%
Other values (71) | 15095 | 50.3%

Minimum 5 values
Value | Count | Frequency (%)
10000 | 493 | 1.6%
16000 | 2 | < 0.1%
20000 | 1976 | 6.6%
30000 | 1610 | 5.4%
40000 | 230 | 0.8%

Maximum 5 values
Value | Count | Frequency (%)
1000000 | 1 | < 0.1%
800000 | 2 | < 0.1%
780000 | 2 | < 0.1%
760000 | 1 | < 0.1%
750000 | 4 | < 0.1%

SEX
Categorical

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.4 KiB

Common values
Value | Count | Frequency (%)
2 | 18112 | 60.4%
1 | 11888 | 39.6%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

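Each Common values table lists a value, its count, and that count as a share of the 30000 observations. A minimal sketch of building such a table with `collections.Counter` follows. The helper name `frequency_table` is hypothetical, and the rule for printing "< 0.1%" (any share below 0.05%) is an assumption chosen to agree with the tables in this report.

```python
from collections import Counter


def frequency_table(values):
    """Hypothetical helper: (value, count, share) rows, most common first."""
    counts = Counter(values)
    n = len(values)
    rows = []
    for value, count in counts.most_common():
        pct = 100 * count / n
        # Assumed display rule: shares below 0.05% are shown as "< 0.1%".
        share = "< 0.1%" if pct < 0.05 else f"{pct:.1f}%"
        rows.append((value, count, share))
    return rows


# Rebuild the SEX column from the counts reported above (2: 18112, 1: 11888).
sex = ["2"] * 18112 + ["1"] * 11888
for value, count, share in frequency_table(sex):
    print(value, count, share, sep=" | ")
```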
EDUCATION
Real number (ℝ≥0)

Distinct count: 7
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.8531333333333333
Minimum: 0
Maximum: 6
Zeros: 14
Zeros (%): < 0.1%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 1
median: 2
Q3: 2
95-th percentile: 3
Maximum: 6
Range: 6
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 0.7903486597
Coefficient of variation (CV): 0.426493143
Kurtosis: 2.078621603
Mean: 1.853133333
Median Absolute Deviation (MAD): 1
Skewness: 0.9709720486
Sum: 55594
Variance: 0.6246510039

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
2 | 14030 | 46.8%
1 | 10585 | 35.3%
3 | 4917 | 16.4%
5 | 280 | 0.9%
4 | 123 | 0.4%
6 | 51 | 0.2%
0 | 14 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 14 | < 0.1%
1 | 10585 | 35.3%
2 | 14030 | 46.8%
3 | 4917 | 16.4%
4 | 123 | 0.4%

Maximum 5 values
Value | Count | Frequency (%)
6 | 51 | 0.2%
5 | 280 | 0.9%
4 | 123 | 0.4%
3 | 4917 | 16.4%
2 | 14030 | 46.8%

MARRIAGE
Categorical

Distinct count: 4
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.4 KiB

Common values
Value | Count | Frequency (%)
2 | 15964 | 53.2%
1 | 13659 | 45.5%
3 | 323 | 1.1%
0 | 54 | 0.2%

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

AGE
Real number (ℝ≥0)

Distinct count: 56
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.4855
Minimum: 21
Maximum: 79
Zeros: 0
Zeros (%): 0.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 21
5-th percentile: 23
Q1: 28
median: 34
Q3: 41
95-th percentile: 53
Maximum: 79
Range: 58
Interquartile range (IQR): 13

Descriptive statistics

Standard deviation: 9.217904068
Coefficient of variation (CV): 0.2597653709
Kurtosis: 0.04430337824
Mean: 35.4855
Median Absolute Deviation (MAD): 6
Skewness: 0.7322458688
Sum: 1064565
Variance: 84.96975541

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
29 | 1605 | 5.3%
27 | 1477 | 4.9%
28 | 1409 | 4.7%
30 | 1395 | 4.7%
26 | 1256 | 4.2%
31 | 1217 | 4.1%
25 | 1186 | 4.0%
34 | 1162 | 3.9%
32 | 1158 | 3.9%
33 | 1146 | 3.8%
Other values (46) | 16989 | 56.6%

Minimum 5 values
Value | Count | Frequency (%)
21 | 67 | 0.2%
22 | 560 | 1.9%
23 | 931 | 3.1%
24 | 1127 | 3.8%
25 | 1186 | 4.0%

Maximum 5 values
Value | Count | Frequency (%)
79 | 1 | < 0.1%
75 | 3 | < 0.1%
74 | 1 | < 0.1%
73 | 4 | < 0.1%
72 | 3 | < 0.1%

PAY_0
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.0167
Minimum: -2
Maximum: 8
Zeros: 14737
Zeros (%): 49.1%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.123801528
Coefficient of variation (CV): -67.29350467
Kurtosis: 2.720715042
Mean: -0.0167
Median Absolute Deviation (MAD): 1
Skewness: 0.7319749269
Sum: -501
Variance: 1.262929874

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 14737 | 49.1%
-1 | 5686 | 19.0%
1 | 3688 | 12.3%
-2 | 2759 | 9.2%
2 | 2667 | 8.9%
3 | 322 | 1.1%
4 | 76 | 0.3%
5 | 26 | 0.1%
8 | 19 | 0.1%
6 | 11 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 2759 | 9.2%
-1 | 5686 | 19.0%
0 | 14737 | 49.1%
1 | 3688 | 12.3%
2 | 2667 | 8.9%

Maximum 5 values
Value | Count | Frequency (%)
8 | 19 | 0.1%
7 | 9 | < 0.1%
6 | 11 | < 0.1%
5 | 26 | 0.1%
4 | 76 | 0.3%

PAY_2
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.13376666666666667
Minimum: -2
Maximum: 8
Zeros: 15730
Zeros (%): 52.4%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.197185973
Coefficient of variation (CV): -8.949807922
Kurtosis: 1.57041773
Mean: -0.1337666667
Median Absolute Deviation (MAD): 0
Skewness: 0.7905650222
Sum: -4013
Variance: 1.433254254

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 15730 | 52.4%
-1 | 6050 | 20.2%
2 | 3927 | 13.1%
-2 | 3782 | 12.6%
3 | 326 | 1.1%
4 | 99 | 0.3%
1 | 28 | 0.1%
5 | 25 | 0.1%
7 | 20 | 0.1%
6 | 12 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 3782 | 12.6%
-1 | 6050 | 20.2%
0 | 15730 | 52.4%
1 | 28 | 0.1%
2 | 3927 | 13.1%

Maximum 5 values
Value | Count | Frequency (%)
8 | 1 | < 0.1%
7 | 20 | 0.1%
6 | 12 | < 0.1%
5 | 25 | 0.1%
4 | 99 | 0.3%

PAY_3
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.1662
Minimum: -2
Maximum: 8
Zeros: 15764
Zeros (%): 52.5%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.196867568
Coefficient of variation (CV): -7.201369245
Kurtosis: 2.084435875
Mean: -0.1662
Median Absolute Deviation (MAD): 0
Skewness: 0.8406818269
Sum: -4986
Variance: 1.432491976

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 15764 | 52.5%
-1 | 5938 | 19.8%
-2 | 4085 | 13.6%
2 | 3819 | 12.7%
3 | 240 | 0.8%
4 | 76 | 0.3%
7 | 27 | 0.1%
6 | 23 | 0.1%
5 | 21 | 0.1%
1 | 4 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 4085 | 13.6%
-1 | 5938 | 19.8%
0 | 15764 | 52.5%
1 | 4 | < 0.1%
2 | 3819 | 12.7%

Maximum 5 values
Value | Count | Frequency (%)
8 | 3 | < 0.1%
7 | 27 | 0.1%
6 | 23 | 0.1%
5 | 21 | 0.1%
4 | 76 | 0.3%

PAY_4
Real number (ℝ)

ZEROS

Distinct count: 11
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.22066666666666668
Minimum: -2
Maximum: 8
Zeros: 16455
Zeros (%): 54.9%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.169138622
Coefficient of variation (CV): -5.29821128
Kurtosis: 3.496983496
Mean: -0.2206666667
Median Absolute Deviation (MAD): 0
Skewness: 0.9996294133
Sum: -6620
Variance: 1.366885118

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 16455 | 54.9%
-1 | 5687 | 19.0%
-2 | 4348 | 14.5%
2 | 3159 | 10.5%
3 | 180 | 0.6%
4 | 69 | 0.2%
7 | 58 | 0.2%
5 | 35 | 0.1%
6 | 5 | < 0.1%
8 | 2 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 4348 | 14.5%
-1 | 5687 | 19.0%
0 | 16455 | 54.9%
1 | 2 | < 0.1%
2 | 3159 | 10.5%

Maximum 5 values
Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 58 | 0.2%
6 | 5 | < 0.1%
5 | 35 | 0.1%
4 | 69 | 0.2%

PAY_5
Real number (ℝ)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2662
Minimum: -2
Maximum: 8
Zeros: 16947
Zeros (%): 56.5%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.133187406
Coefficient of variation (CV): -4.256902352
Kurtosis: 3.989748144
Mean: -0.2662
Median Absolute Deviation (MAD): 0
Skewness: 1.008197025
Sum: -7986
Variance: 1.284113697

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 16947 | 56.5%
-1 | 5539 | 18.5%
-2 | 4546 | 15.2%
2 | 2626 | 8.8%
3 | 178 | 0.6%
4 | 84 | 0.3%
7 | 58 | 0.2%
5 | 17 | 0.1%
6 | 4 | < 0.1%
8 | 1 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 4546 | 15.2%
-1 | 5539 | 18.5%
0 | 16947 | 56.5%
2 | 2626 | 8.8%
3 | 178 | 0.6%

Maximum 5 values
Value | Count | Frequency (%)
8 | 1 | < 0.1%
7 | 58 | 0.2%
6 | 4 | < 0.1%
5 | 17 | 0.1%
4 | 84 | 0.3%

PAY_6
Real number (ℝ)

ZEROS

Distinct count: 10
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: -0.2911
Minimum: -2
Maximum: 8
Zeros: 16286
Zeros (%): 54.3%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -2
5-th percentile: -2
Q1: -1
median: 0
Q3: 0
95-th percentile: 2
Maximum: 8
Range: 10
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.149987626
Coefficient of variation (CV): -3.950489954
Kurtosis: 3.42653413
Mean: -0.2911
Median Absolute Deviation (MAD): 0
Skewness: 0.9480293916
Sum: -8733
Variance: 1.322471539

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 16286 | 54.3%
-1 | 5740 | 19.1%
-2 | 4895 | 16.3%
2 | 2766 | 9.2%
3 | 184 | 0.6%
4 | 49 | 0.2%
7 | 46 | 0.2%
6 | 19 | 0.1%
5 | 13 | < 0.1%
8 | 2 | < 0.1%

Minimum 5 values
Value | Count | Frequency (%)
-2 | 4895 | 16.3%
-1 | 5740 | 19.1%
0 | 16286 | 54.3%
2 | 2766 | 9.2%
3 | 184 | 0.6%

Maximum 5 values
Value | Count | Frequency (%)
8 | 2 | < 0.1%
7 | 46 | 0.2%
6 | 19 | 0.1%
5 | 13 | < 0.1%
4 | 49 | 0.2%

BILL_AMT1
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 22723
Unique (%): 75.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 51223.3309
Minimum: -165580.0
Maximum: 964511.0
Zeros: 2008
Zeros (%): 6.7%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -165580
5-th percentile: 0
Q1: 3558.75
median: 22381.5
Q3: 67091
95-th percentile: 201203.05
Maximum: 964511
Range: 1130091
Interquartile range (IQR): 63532.25

Descriptive statistics

Standard deviation: 73635.86058
Coefficient of variation (CV): 1.437545339
Kurtosis: 9.806289341
Mean: 51223.3309
Median Absolute Deviation (MAD): 21800.5
Skewness: 2.663861022
Sum: 1536699927
Variance: 5422239963

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 2008 | 6.7%
390 | 244 | 0.8%
780 | 76 | 0.3%
326 | 72 | 0.2%
316 | 63 | 0.2%
2500 | 59 | 0.2%
396 | 49 | 0.2%
2400 | 39 | 0.1%
416 | 29 | 0.1%
1050 | 25 | 0.1%
Other values (22713) | 27336 | 91.1%

Minimum 5 values
Value | Count | Frequency (%)
-165580 | 1 | < 0.1%
-154973 | 1 | < 0.1%
-15308 | 1 | < 0.1%
-14386 | 1 | < 0.1%
-11545 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
964511 | 1 | < 0.1%
746814 | 1 | < 0.1%
653062 | 1 | < 0.1%
630458 | 1 | < 0.1%
626648 | 1 | < 0.1%

BILL_AMT2
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 22346
Unique (%): 74.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49179.07516666667
Minimum: -69777.0
Maximum: 983931.0
Zeros: 2506
Zeros (%): 8.4%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -69777
5-th percentile: 0
Q1: 2984.75
median: 21200
Q3: 64006.25
95-th percentile: 194792.2
Maximum: 983931
Range: 1053708
Interquartile range (IQR): 61021.5

Descriptive statistics

Standard deviation: 71173.76878
Coefficient of variation (CV): 1.447236829
Kurtosis: 10.30294592
Mean: 49179.07517
Median Absolute Deviation (MAD): 20810
Skewness: 2.705220853
Sum: 1475372255
Variance: 5065705363

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 2506 | 8.4%
390 | 231 | 0.8%
780 | 75 | 0.2%
326 | 75 | 0.2%
316 | 72 | 0.2%
2500 | 51 | 0.2%
396 | 51 | 0.2%
2400 | 42 | 0.1%
-200 | 29 | 0.1%
416 | 28 | 0.1%
Other values (22336) | 26840 | 89.5%

Minimum 5 values
Value | Count | Frequency (%)
-69777 | 1 | < 0.1%
-67526 | 1 | < 0.1%
-33350 | 1 | < 0.1%
-30000 | 1 | < 0.1%
-26214 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
983931 | 1 | < 0.1%
743970 | 1 | < 0.1%
671563 | 1 | < 0.1%
646770 | 1 | < 0.1%
624475 | 1 | < 0.1%

BILL_AMT3
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 22026
Unique (%): 73.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 47013.1548
Minimum: -157264.0
Maximum: 1664089.0
Zeros: 2870
Zeros (%): 9.6%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -157264
5-th percentile: 0
Q1: 2666.25
median: 20088.5
Q3: 60164.75
95-th percentile: 187821.05
Maximum: 1664089
Range: 1821353
Interquartile range (IQR): 57498.5

Descriptive statistics

Standard deviation: 69349.38743
Coefficient of variation (CV): 1.475106015
Kurtosis: 19.78325514
Mean: 47013.1548
Median Absolute Deviation (MAD): 19708.5
Skewness: 3.087830046
Sum: 1410394644
Variance: 4809337537

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 2870 | 9.6%
390 | 275 | 0.9%
780 | 74 | 0.2%
326 | 63 | 0.2%
316 | 62 | 0.2%
396 | 48 | 0.2%
2500 | 40 | 0.1%
2400 | 39 | 0.1%
416 | 29 | 0.1%
200 | 27 | 0.1%
Other values (22016) | 26473 | 88.2%

Minimum 5 values
Value | Count | Frequency (%)
-157264 | 1 | < 0.1%
-61506 | 1 | < 0.1%
-46127 | 1 | < 0.1%
-34041 | 1 | < 0.1%
-25443 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1664089 | 1 | < 0.1%
855086 | 1 | < 0.1%
693131 | 1 | < 0.1%
689643 | 1 | < 0.1%
689627 | 1 | < 0.1%

BILL_AMT4
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 21548
Unique (%): 71.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43262.94896666666
Minimum: -170000.0
Maximum: 891586.0
Zeros: 3195
Zeros (%): 10.7%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -170000
5-th percentile: 0
Q1: 2326.75
median: 19052
Q3: 54506
95-th percentile: 174333.35
Maximum: 891586
Range: 1061586
Interquartile range (IQR): 52179.25

Descriptive statistics

Standard deviation: 64332.85613
Coefficient of variation (CV): 1.487019671
Kurtosis: 11.30932483
Mean: 43262.94897
Median Absolute Deviation (MAD): 18656
Skewness: 2.821965291
Sum: 1297888469
Variance: 4138716378

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 3195 | 10.7%
390 | 246 | 0.8%
780 | 101 | 0.3%
316 | 68 | 0.2%
326 | 62 | 0.2%
396 | 44 | 0.1%
150 | 39 | 0.1%
2400 | 39 | 0.1%
2500 | 34 | 0.1%
416 | 33 | 0.1%
Other values (21538) | 26139 | 87.1%

Minimum 5 values
Value | Count | Frequency (%)
-170000 | 1 | < 0.1%
-81334 | 1 | < 0.1%
-65167 | 1 | < 0.1%
-50616 | 1 | < 0.1%
-46627 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
891586 | 1 | < 0.1%
706864 | 1 | < 0.1%
628699 | 1 | < 0.1%
616836 | 1 | < 0.1%
572805 | 1 | < 0.1%

BILL_AMT5
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 21010
Unique (%): 70.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 40311.40096666667
Minimum: -81334.0
Maximum: 927171.0
Zeros: 3506
Zeros (%): 11.7%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -81334
5-th percentile: 0
Q1: 1763
median: 18104.5
Q3: 50190.5
95-th percentile: 165794.3
Maximum: 927171
Range: 1008505
Interquartile range (IQR): 48427.5

Descriptive statistics

Standard deviation: 60797.15577
Coefficient of variation (CV): 1.508187617
Kurtosis: 12.30588129
Mean: 40311.40097
Median Absolute Deviation (MAD): 17688.5
Skewness: 2.876379867
Sum: 1209342029
Variance: 3696294150

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 3506 | 11.7%
390 | 235 | 0.8%
780 | 94 | 0.3%
316 | 79 | 0.3%
326 | 62 | 0.2%
150 | 58 | 0.2%
396 | 47 | 0.2%
2400 | 39 | 0.1%
2500 | 37 | 0.1%
416 | 36 | 0.1%
Other values (21000) | 25807 | 86.0%

Minimum 5 values
Value | Count | Frequency (%)
-81334 | 1 | < 0.1%
-61372 | 1 | < 0.1%
-53007 | 1 | < 0.1%
-46627 | 1 | < 0.1%
-37594 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
927171 | 1 | < 0.1%
823540 | 1 | < 0.1%
587067 | 1 | < 0.1%
551702 | 1 | < 0.1%
547880 | 1 | < 0.1%

BILL_AMT6
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct count: 20604
Unique (%): 68.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 38871.7604
Minimum: -339603.0
Maximum: 961664.0
Zeros: 4020
Zeros (%): 13.4%
Memory size: 234.4 KiB

Quantile statistics

Minimum: -339603
5-th percentile: 0
Q1: 1256
median: 17071
Q3: 49198.25
95-th percentile: 161912
Maximum: 961664
Range: 1301267
Interquartile range (IQR): 47942.25

Descriptive statistics

Standard deviation: 59554.10754
Coefficient of variation (CV): 1.53206613
Kurtosis: 12.27070529
Mean: 38871.7604
Median Absolute Deviation (MAD): 16755
Skewness: 2.846644576
Sum: 1166152812
Variance: 3546691724

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 4020 | 13.4%
390 | 207 | 0.7%
780 | 86 | 0.3%
150 | 78 | 0.3%
316 | 77 | 0.3%
326 | 56 | 0.2%
396 | 45 | 0.1%
416 | 36 | 0.1%
-18 | 33 | 0.1%
2400 | 32 | 0.1%
Other values (20594) | 25330 | 84.4%

Minimum 5 values
Value | Count | Frequency (%)
-339603 | 1 | < 0.1%
-209051 | 1 | < 0.1%
-150953 | 1 | < 0.1%
-94625 | 1 | < 0.1%
-73895 | 1 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
961664 | 1 | < 0.1%
699944 | 1 | < 0.1%
568638 | 1 | < 0.1%
527711 | 1 | < 0.1%
527566 | 1 | < 0.1%

PAY_AMT1
Real number (ℝ≥0)

ZEROS

Distinct count: 7943
Unique (%): 26.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5663.5805
Minimum: 0.0
Maximum: 873552.0
Zeros: 5249
Zeros (%): 17.5%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 1000
median: 2100
Q3: 5006
95-th percentile: 18428.2
Maximum: 873552
Range: 873552
Interquartile range (IQR): 4006

Descriptive statistics

Standard deviation: 16563.28035
Coefficient of variation (CV): 2.924524575
Kurtosis: 415.2547427
Mean: 5663.5805
Median Absolute Deviation (MAD): 1932
Skewness: 14.66836433
Sum: 169907415
Variance: 274342256.1

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 5249 | 17.5%
2000 | 1363 | 4.5%
3000 | 891 | 3.0%
5000 | 698 | 2.3%
1500 | 507 | 1.7%
4000 | 426 | 1.4%
10000 | 401 | 1.3%
1000 | 365 | 1.2%
2500 | 298 | 1.0%
6000 | 294 | 1.0%
Other values (7933) | 19508 | 65.0%

Minimum 5 values
Value | Count | Frequency (%)
0 | 5249 | 17.5%
1 | 9 | < 0.1%
2 | 14 | < 0.1%
3 | 15 | 0.1%
4 | 18 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
873552 | 1 | < 0.1%
505000 | 1 | < 0.1%
493358 | 1 | < 0.1%
423903 | 1 | < 0.1%
405016 | 1 | < 0.1%

PAY_AMT2
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct count: 7899
Unique (%): 26.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5921.1635
Minimum: 0.0
Maximum: 1684259.0
Zeros: 5396
Zeros (%): 18.0%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 833
median: 2009
Q3: 5000
95-th percentile: 19004.35
Maximum: 1684259
Range: 1684259
Interquartile range (IQR): 4167

Descriptive statistics

Standard deviation: 23040.8704
Coefficient of variation (CV): 3.891274139
Kurtosis: 1641.631911
Mean: 5921.1635
Median Absolute Deviation (MAD): 1991
Skewness: 30.45381745
Sum: 177634905
Variance: 530881708.9

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 5396 | 18.0%
2000 | 1290 | 4.3%
3000 | 857 | 2.9%
5000 | 717 | 2.4%
1000 | 594 | 2.0%
1500 | 521 | 1.7%
4000 | 410 | 1.4%
10000 | 318 | 1.1%
6000 | 283 | 0.9%
2500 | 251 | 0.8%
Other values (7889) | 19363 | 64.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 5396 | 18.0%
1 | 15 | 0.1%
2 | 20 | 0.1%
3 | 18 | 0.1%
4 | 11 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
1684259 | 1 | < 0.1%
1227082 | 1 | < 0.1%
1215471 | 1 | < 0.1%
1024516 | 1 | < 0.1%
580464 | 1 | < 0.1%

PAY_AMT3
Real number (ℝ≥0)

ZEROS

Distinct count: 7518
Unique (%): 25.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5225.6815
Minimum: 0.0
Maximum: 896040.0
Zeros: 5968
Zeros (%): 19.9%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 390
median: 1800
Q3: 4505
95-th percentile: 17589.4
Maximum: 896040
Range: 896040
Interquartile range (IQR): 4115

Descriptive statistics

Standard deviation: 17606.96147
Coefficient of variation (CV): 3.36931393
Kurtosis: 564.3112295
Mean: 5225.6815
Median Absolute Deviation (MAD): 1795
Skewness: 17.21663544
Sum: 156770445
Variance: 310005092.2

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 5968 | 19.9%
2000 | 1285 | 4.3%
1000 | 1103 | 3.7%
3000 | 870 | 2.9%
5000 | 721 | 2.4%
1500 | 490 | 1.6%
4000 | 381 | 1.3%
10000 | 312 | 1.0%
1200 | 243 | 0.8%
6000 | 241 | 0.8%
Other values (7508) | 18386 | 61.3%

Minimum 5 values
Value | Count | Frequency (%)
0 | 5968 | 19.9%
1 | 13 | < 0.1%
2 | 19 | 0.1%
3 | 14 | < 0.1%
4 | 15 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
896040 | 1 | < 0.1%
889043 | 1 | < 0.1%
508229 | 1 | < 0.1%
417588 | 1 | < 0.1%
400972 | 1 | < 0.1%

PAY_AMT4
Real number (ℝ≥0)

ZEROS

Distinct count: 6937
Unique (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4826.076866666666
Minimum: 0.0
Maximum: 621000.0
Zeros: 6408
Zeros (%): 21.4%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 296
median: 1500
Q3: 4013.25
95-th percentile: 16014.95
Maximum: 621000
Range: 621000
Interquartile range (IQR): 3717.25

Descriptive statistics

Standard deviation: 15666.15974
Coefficient of variation (CV): 3.246147995
Kurtosis: 277.3337677
Mean: 4826.076867
Median Absolute Deviation (MAD): 1500
Skewness: 12.90498482
Sum: 144782306
Variance: 245428561.1

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 6408 | 21.4%
1000 | 1394 | 4.6%
2000 | 1214 | 4.0%
3000 | 887 | 3.0%
5000 | 810 | 2.7%
1500 | 441 | 1.5%
4000 | 402 | 1.3%
10000 | 341 | 1.1%
2500 | 259 | 0.9%
500 | 258 | 0.9%
Other values (6927) | 17586 | 58.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 6408 | 21.4%
1 | 22 | 0.1%
2 | 22 | 0.1%
3 | 13 | < 0.1%
4 | 20 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
621000 | 1 | < 0.1%
528897 | 1 | < 0.1%
497000 | 1 | < 0.1%
432130 | 1 | < 0.1%
400046 | 1 | < 0.1%

PAY_AMT5
Real number (ℝ≥0)

ZEROS

Distinct count: 6897
Unique (%): 23.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4799.387633333334
Minimum: 0.0
Maximum: 426529.0
Zeros: 6703
Zeros (%): 22.3%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 252.5
median: 1500
Q3: 4031.5
95-th percentile: 16000
Maximum: 426529
Range: 426529
Interquartile range (IQR): 3779

Descriptive statistics

Standard deviation: 15278.30568
Coefficient of variation (CV): 3.183386475
Kurtosis: 180.0639402
Mean: 4799.387633
Median Absolute Deviation (MAD): 1500
Skewness: 11.12741705
Sum: 143981629
Variance: 233426624.4

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 6703 | 22.3%
1000 | 1340 | 4.5%
2000 | 1323 | 4.4%
3000 | 947 | 3.2%
5000 | 814 | 2.7%
1500 | 426 | 1.4%
4000 | 401 | 1.3%
10000 | 343 | 1.1%
500 | 250 | 0.8%
6000 | 247 | 0.8%
Other values (6887) | 17206 | 57.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 6703 | 22.3%
1 | 21 | 0.1%
2 | 13 | < 0.1%
3 | 13 | < 0.1%
4 | 12 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
426529 | 1 | < 0.1%
417990 | 1 | < 0.1%
388071 | 1 | < 0.1%
379267 | 1 | < 0.1%
332000 | 1 | < 0.1%

PAY_AMT6
Real number (ℝ≥0)

ZEROS

Distinct count: 6939
Unique (%): 23.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5215.502566666667
Minimum: 0.0
Maximum: 528666.0
Zeros: 7173
Zeros (%): 23.9%
Memory size: 234.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 117.75
median: 1500
Q3: 4000
95-th percentile: 17343.8
Maximum: 528666
Range: 528666
Interquartile range (IQR): 3882.25

Descriptive statistics

Standard deviation: 17777.46578
Coefficient of variation (CV): 3.408581541
Kurtosis: 167.1614296
Mean: 5215.502567
Median Absolute Deviation (MAD): 1500
Skewness: 10.64072733
Sum: 156465077
Variance: 316038289.4

[Figure: Histogram with fixed size bins (bins=10)]

Common values
Value | Count | Frequency (%)
0 | 7173 | 23.9%
1000 | 1299 | 4.3%
2000 | 1295 | 4.3%
3000 | 914 | 3.0%
5000 | 808 | 2.7%
1500 | 439 | 1.5%
4000 | 411 | 1.4%
10000 | 356 | 1.2%
500 | 247 | 0.8%
6000 | 220 | 0.7%
Other values (6929) | 16838 | 56.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 7173 | 23.9%
1 | 20 | 0.1%
2 | 9 | < 0.1%
3 | 14 | < 0.1%
4 | 12 | < 0.1%

Maximum 5 values
Value | Count | Frequency (%)
528666 | 1 | < 0.1%
527143 | 1 | < 0.1%
443001 | 1 | < 0.1%
422000 | 1 | < 0.1%
403500 | 1 | < 0.1%

default.payment.next.month
Boolean

Distinct count: 2
Unique (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 234.4 KiB

Common values
Value | Count | Frequency (%)
0 | 23364 | 77.9%
1 | 6636 | 22.1%

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line does not affect r; only the sign of its slope does.

To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
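A minimal standard-library sketch of this definition follows; `pearson_r` is an illustrative name, not part of pandas-profiling. Because the same normalising factor appears in the covariance and in both standard deviations, it cancels, so raw sums of deviations suffice. The last lines check the location/scale invariance numerically.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)  # the common 1/n (or 1/(n - 1)) factors cancel


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)

# Shifting and positively rescaling either variable leaves r unchanged.
r_transformed = pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y])
print(r, r_transformed)
```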

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
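Following that recipe, ρ can be sketched as Pearson's r computed on ranks. Tied values receive the average of the ranks they span, which is the usual convention (and the default of `scipy.stats.rankdata`); the helper names here are illustrative only.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in xs) * sum((b - my) ** 2 for b in ys))


def spearman_rho(xs, ys):
    """Pearson's r applied to the rank variables of X and Y."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but strictly increasing
print(pearson_r(x, y), spearman_rho(x, y))
```

For this strictly increasing but nonlinear pair, ρ is exactly 1 while r falls short of 1, which is the distinction drawn above.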

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2 for n observations.
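A direct O(n²) sketch of the pair-counting definition above follows. It implements the plain tau-a form as described. Library routines such as `scipy.stats.kendalltau` default to the tie-adjusted tau-b, so results differ when either variable contains ties; the function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total pairs; pairs tied in X or Y count as neither."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n = len(xs)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# 7 concordant and 3 discordant pairs out of 10.
print(kendall_tau_a([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]))
```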

Phik (φk)

Phik (φk) is a practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma in 2013.
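A sketch of that bias correction as it is commonly stated: with φ² = χ²/n for an r × k contingency table, the corrected quantities are φ̃² = max(0, φ² - (k - 1)(r - 1)/(n - 1)), r̃ = r - (r - 1)²/(n - 1) and k̃ = k - (k - 1)²/(n - 1), giving Ṽ = sqrt(φ̃² / min(k̃ - 1, r̃ - 1)). The function below builds the table with `Counter` and computes Pearson's χ² directly. It illustrates that formulation and is not pandas-profiling's own implementation.

```python
import math
from collections import Counter


def cramers_v_corrected(xs, ys):
    """Bias-corrected Cramér's V (Bergsma's correction) for two lists of labels."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals = Counter(xs)
    col_totals = Counter(ys)
    # Pearson chi-squared statistic over every cell of the contingency table.
    chi2 = 0.0
    for a, row_total in row_totals.items():
        for b, col_total in col_totals.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    r, k = len(row_totals), len(col_totals)
    phi2_corr = max(0.0, chi2 / n - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


labels = ["a", "b"] * 50
print(cramers_v_corrected(labels, labels))                                  # perfect association
print(cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["x", "y"] * 50))        # independence
```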

Sample

First rows

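Index | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default.payment.next.month
0 | 1 | 20000.0 | 2 | 2 | 1 | 24 | 2 | 2 | -1 | -1 | -2 | -2 | 3913.0 | 3102.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 689.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1
1 | 2 | 120000.0 | 2 | 2 | 2 | 26 | -1 | 2 | 0 | 0 | 0 | 2 | 2682.0 | 1725.0 | 2682.0 | 3272.0 | 3455.0 | 3261.0 | 0.0 | 1000.0 | 1000.0 | 1000.0 | 0.0 | 2000.0 | 1
2 | 3 | 90000.0 | 2 | 2 | 2 | 34 | 0 | 0 | 0 | 0 | 0 | 0 | 29239.0 | 14027.0 | 13559.0 | 14331.0 | 14948.0 | 15549.0 | 1518.0 | 1500.0 | 1000.0 | 1000.0 | 1000.0 | 5000.0 | 0
3 | 4 | 50000.0 | 2 | 2 | 1 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 46990.0 | 48233.0 | 49291.0 | 28314.0 | 28959.0 | 29547.0 | 2000.0 | 2019.0 | 1200.0 | 1100.0 | 1069.0 | 1000.0 | 0
4 | 5 | 50000.0 | 1 | 2 | 1 | 57 | -1 | 0 | -1 | 0 | 0 | 0 | 8617.0 | 5670.0 | 35835.0 | 20940.0 | 19146.0 | 19131.0 | 2000.0 | 36681.0 | 10000.0 | 9000.0 | 689.0 | 679.0 | 0
5 | 6 | 50000.0 | 1 | 1 | 2 | 37 | 0 | 0 | 0 | 0 | 0 | 0 | 64400.0 | 57069.0 | 57608.0 | 19394.0 | 19619.0 | 20024.0 | 2500.0 | 1815.0 | 657.0 | 1000.0 | 1000.0 | 800.0 | 0
6 | 7 | 500000.0 | 1 | 1 | 2 | 29 | 0 | 0 | 0 | 0 | 0 | 0 | 367965.0 | 412023.0 | 445007.0 | 542653.0 | 483003.0 | 473944.0 | 55000.0 | 40000.0 | 38000.0 | 20239.0 | 13750.0 | 13770.0 | 0
7 | 8 | 100000.0 | 2 | 2 | 2 | 23 | 0 | -1 | -1 | 0 | 0 | -1 | 11876.0 | 380.0 | 601.0 | 221.0 | -159.0 | 567.0 | 380.0 | 601.0 | 0.0 | 581.0 | 1687.0 | 1542.0 | 0
8 | 9 | 140000.0 | 2 | 3 | 1 | 28 | 0 | 0 | 2 | 0 | 0 | 0 | 11285.0 | 14096.0 | 12108.0 | 12211.0 | 11793.0 | 3719.0 | 3329.0 | 0.0 | 432.0 | 1000.0 | 1000.0 | 1000.0 | 0
9 | 10 | 20000.0 | 1 | 3 | 2 | 35 | -2 | -2 | -2 | -2 | -1 | -1 | 0.0 | 0.0 | 0.0 | 0.0 | 13007.0 | 13912.0 | 0.0 | 0.0 | 0.0 | 13007.0 | 1122.0 | 0.0 | 0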

Last rows

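Index | ID | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | PAY_6 | BILL_AMT1 | BILL_AMT2 | BILL_AMT3 | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default.payment.next.month
29990 | 29991 | 140000.0 | 1 | 2 | 1 | 41 | 0 | 0 | 0 | 0 | 0 | 0 | 138325.0 | 137142.0 | 139110.0 | 138262.0 | 49675.0 | 46121.0 | 6000.0 | 7000.0 | 4228.0 | 1505.0 | 2000.0 | 2000.0 | 0
29991 | 29992 | 210000.0 | 1 | 2 | 1 | 34 | 3 | 2 | 2 | 2 | 2 | 2 | 2500.0 | 2500.0 | 2500.0 | 2500.0 | 2500.0 | 2500.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1
29992 | 29993 | 10000.0 | 1 | 3 | 1 | 43 | 0 | 0 | 0 | -2 | -2 | -2 | 8802.0 | 10400.0 | 0.0 | 0.0 | 0.0 | 0.0 | 2000.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0
29993 | 29994 | 100000.0 | 1 | 1 | 2 | 38 | 0 | -1 | -1 | 0 | 0 | 0 | 3042.0 | 1427.0 | 102996.0 | 70626.0 | 69473.0 | 55004.0 | 2000.0 | 111784.0 | 4000.0 | 3000.0 | 2000.0 | 2000.0 | 0
29994 | 29995 | 80000.0 | 1 | 2 | 2 | 34 | 2 | 2 | 2 | 2 | 2 | 2 | 72557.0 | 77708.0 | 79384.0 | 77519.0 | 82607.0 | 81158.0 | 7000.0 | 3500.0 | 0.0 | 7000.0 | 0.0 | 4000.0 | 1
29995 | 29996 | 220000.0 | 1 | 3 | 1 | 39 | 0 | 0 | 0 | 0 | 0 | 0 | 188948.0 | 192815.0 | 208365.0 | 88004.0 | 31237.0 | 15980.0 | 8500.0 | 20000.0 | 5003.0 | 3047.0 | 5000.0 | 1000.0 | 0
29996 | 29997 | 150000.0 | 1 | 3 | 2 | 43 | -1 | -1 | -1 | -1 | 0 | 0 | 1683.0 | 1828.0 | 3502.0 | 8979.0 | 5190.0 | 0.0 | 1837.0 | 3526.0 | 8998.0 | 129.0 | 0.0 | 0.0 | 0
29997 | 29998 | 30000.0 | 1 | 2 | 2 | 37 | 4 | 3 | 2 | -1 | 0 | 0 | 3565.0 | 3356.0 | 2758.0 | 20878.0 | 20582.0 | 19357.0 | 0.0 | 0.0 | 22000.0 | 4200.0 | 2000.0 | 3100.0 | 1
29998 | 29999 | 80000.0 | 1 | 3 | 1 | 41 | 1 | -1 | 0 | 0 | 0 | -1 | -1645.0 | 78379.0 | 76304.0 | 52774.0 | 11855.0 | 48944.0 | 85900.0 | 3409.0 | 1178.0 | 1926.0 | 52964.0 | 1804.0 | 1
29999 | 30000 | 50000.0 | 1 | 2 | 1 | 46 | 0 | 0 | 0 | 0 | 0 | 0 | 47929.0 | 48905.0 | 49764.0 | 36535.0 | 32428.0 | 15313.0 | 2078.0 | 1800.0 | 1430.0 | 1000.0 | 1000.0 | 1000.0 | 1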